De novo computational prediction of non-coding RNA genes in prokaryotic genomes

Authors

  • Thao T. Tran
  • Fengfeng Zhou
  • Sarah Marshburn
  • Mark Stead
  • Sidney R. Kushner
  • Ying Xu
Abstract

MOTIVATION: The computational identification of non-coding RNA (ncRNA) genes represents one of the most important and challenging problems in computational biology. Existing methods for ncRNA gene prediction rely mostly on homology information, thus limiting their applications to ncRNA genes with known homologues.

RESULTS: We present a novel de novo prediction algorithm for ncRNA genes using features derived from the sequences and structures of known ncRNA genes in comparison to decoys. Using these features, we have trained a neural network-based classifier and have applied it to Escherichia coli and Sulfolobus solfataricus for genome-wide prediction of ncRNAs. Our method has an average prediction sensitivity and specificity of 68% and 70%, respectively, for identifying windows with potential for ncRNA genes in E. coli. By combining windows of different sizes and using positional filtering strategies, we predicted 601 candidate ncRNAs and recovered 41% of known ncRNAs in E. coli. We experimentally investigated six novel candidates using Northern blot analysis and found expression of three candidates: one represents a potential new ncRNA, one is associated with stable mRNA decay intermediates and one is a case of either a potential riboswitch or transcription attenuator involved in the regulation of cell division. In general, our approach enables the identification of both cis- and trans-acting ncRNAs in partially or completely sequenced microbial genomes without requiring homology or structural conservation.

AVAILABILITY: The source code and results are available at http://csbl.bmb.uga.edu/publications/materials/tran/.
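The window-based scan described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give window sizes, step size, score threshold, or the details of the positional filtering, so every value and helper name below (`sliding_windows`, `merge_intervals`, `predict_candidates`, `gc_fraction`) is a hypothetical placeholder. In particular, `gc_fraction` is a toy stand-in for the trained neural network classifier, and merging overlapping windows is only one plausible way to combine windows of different sizes.

```python
from typing import Callable, List, Sequence, Tuple

Interval = Tuple[int, int]  # half-open coordinates: [start, end)


def sliding_windows(seq_len: int, size: int, step: int) -> List[Interval]:
    """Return all fixed-size windows that fit entirely inside the sequence."""
    return [(start, start + size) for start in range(0, seq_len - size + 1, step)]


def merge_intervals(intervals: Sequence[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into larger candidate regions."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous region instead of opening a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def predict_candidates(
    seq: str,
    score: Callable[[str], float],
    sizes: Sequence[int],
    step: int,
    threshold: float,
) -> List[Interval]:
    """Score windows of several sizes and merge the positive ones.

    `score` stands in for any per-window classifier; the threshold and the
    merge rule are assumptions made for illustration only.
    """
    positives: List[Interval] = []
    for size in sizes:
        for start, end in sliding_windows(len(seq), size, step):
            if score(seq[start:end]) >= threshold:
                positives.append((start, end))
    return merge_intervals(positives)


def gc_fraction(window: str) -> float:
    """Toy score (fraction of G/C bases); NOT the paper's classifier."""
    return sum(base in "GC" for base in window) / len(window)


if __name__ == "__main__":
    toy_genome = "AAAAGGGGCCCCAAAA"
    print(predict_candidates(toy_genome, gc_fraction, sizes=(4, 8), step=2, threshold=1.0))
```

In a real pipeline the scoring function would be replaced by a model trained on sequence and structure features of known ncRNAs versus decoys, and the merged regions would still need the positional filtering that the abstract mentions but does not detail.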

Similar resources

Next-Generation Annotation of Prokaryotic Genomes with EuGene-P: Application to Sinorhizobium meliloti 2011

The availability of next-generation sequences of transcripts from prokaryotic organisms offers the opportunity to design a new generation of automated genome annotation tools not yet available for prokaryotes. In this work, we designed EuGene-P, the first integrative prokaryotic gene finder tool which combines a variety of high-throughput data, including oriented RNA-Seq data, directly into the...

From Structure Prediction to Genomic Screens for Novel Non-Coding RNAs

Non-coding RNAs (ncRNAs) are receiving more and more attention not only as an abundant class of genes, but also as regulatory structural elements (some located in mRNAs). A key feature of RNA function is its structure. Computational methods were developed early for folding and prediction of RNA structure with the aim of assisting in functional analysis. With the discovery of more and more ncRNA...

A Compression-Based Approach for Coding Sequences Identification. I. Application to Prokaryotic Genomes

Most of the gene prediction algorithms for prokaryotes are based on Hidden Markov Models or similar machine-learning approaches, which imply the optimization of a high number of parameters. The present paper presents a novel method for the classification of coding and non-coding regions in prokaryotic genomes, based on a suitably defined compression index of a DNA sequence. The main features of...

Pathway Analysis of miRNA-1 and Its Expression Evaluation in Donor’s Serum from HIV-Positive Individuals vs Unaffected Controls

Background MicroRNAs (miRNAs) are non-coding RNA molecules (19-24 nucleotides) that play a major role in a wide range of biological processes through post-transcriptional regulation of gene expression. Differential expression of miRNAs has been reported in various infectious diseases such as HIV infection. The characterization of miRNA expression profiles, especially in mammalian biofluids, whi...

Eukaryotic Gene Prediction

Introduction: The advent of large-scale genome sequencing has revolutionized the field of genetics and biology. Sequencing projects require sophisticated computational analysis to manage vast collections of data. Scientists first sequenced a genome in 1977, that of a small bacteriophage consisting of 11 genes over 5.4kb of DNA. In the bacteriophage, coding genes comprise 95% of the genome. 1 Si...

Journal title:

Volume 25, Issue

Pages -

Publication date: 2009